08. Optimizing For Inference

Once the graph is frozen, a variety of transformations can be performed, depending on what we wish to achieve. TensorFlow packages several inference optimizations in a tool aptly named optimize_for_inference.

optimize_for_inference does the following:

  • Removes training-specific and debug-specific nodes
  • Fuses common operations (a toy sketch of one such fusion follows this list)
  • Removes entire sections of the graph that are never reached
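One of the fusion passes, fold_batch_norms, folds batch-normalization parameters into the weights of the preceding convolution so the separate batch-norm op disappears. Here's a toy NumPy sketch of the arithmetic involved (the shapes and values are made up for illustration; this is not the tool's actual code):

import numpy as np

# Made-up conv weights (HWIO layout) and batch-norm parameters.
w = np.random.randn(3, 3, 16, 32)        # 3x3 conv, 16 -> 32 channels
gamma, beta = np.ones(32), np.zeros(32)  # learned BN scale and shift
mean, var, eps = np.zeros(32), np.ones(32), 1e-3

# Fold the normalization into the conv: scale each output channel of
# the weights and compute a replacement bias.
scale = gamma / np.sqrt(var + eps)
w_folded = w * scale                      # broadcasts over output channels
b_folded = beta - mean * scale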

Here's how the tool can be used:

~/tensorflow/bazel-bin/tensorflow/python/tools/optimize_for_inference \
--input=frozen_graph.pb \
--output=optimized_graph.pb \
--frozen_graph=True \
--input_names=image_input \
--output_names=Softmax

We'll use the graph we just froze as the input graph, specified with input. output is the name of the output graph; we'll be creative and call it optimized_graph.pb.

The optimize_for_inference tool works on both frozen and unfrozen graphs, so we have to specify whether the graph is already frozen with frozen_graph.

input_names and output_names are the names of the input and output nodes, respectively. As the option names suggest, there can be more than one input or output node, separated by commas.
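For example, if the graph also took a dropout placeholder as an input (the keep_prob name here is hypothetical), we'd pass both names:

--input_names=image_input,keep_prob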

Let’s take a look at the effect this has on the number of operations.
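We'll reuse the load_graph helper from the earlier freezing step. In case it isn't handy, here's a minimal sketch of what it does, assuming TensorFlow 1.x (the tutorial's actual helper may differ in its details):

import tensorflow as tf

def load_graph(graph_file):
    # Parse the serialized GraphDef from disk.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(graph_file, 'rb') as f:
        graph_def.ParseFromString(f.read())

    # Import it into a fresh graph and open a session on that graph.
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name='')
    sess = tf.Session(graph=graph)
    return sess, graph.get_operations()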

from graph_utils import load_graph

sess, optimized_ops = load_graph('optimized_graph.pb')
print(len(optimized_ops)) # 200

Cool, now there are only 200 operations in the graph!
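As an aside, if you'd rather not go through the bazel-built binary, the same passes are exposed as a Python library function. Here's a minimal sketch, assuming TensorFlow 1.x and the frozen_graph.pb from the previous step:

import tensorflow as tf
from tensorflow.python.tools import optimize_for_inference_lib

# Load the frozen graph produced in the previous step.
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Apply the same transformations the command-line tool performs.
optimized_graph_def = optimize_for_inference_lib.optimize_for_inference(
    graph_def,
    ['image_input'],              # input node names
    ['Softmax'],                  # output node names
    tf.float32.as_datatype_enum)  # dtype of the input placeholder

# Serialize the result, just as --output does.
with tf.gfile.GFile('optimized_graph.pb', 'wb') as f:
    f.write(optimized_graph_def.SerializeToString())

This is handy when optimization needs to happen as part of a larger Python pipeline rather than as a separate build step.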